Probabilistic Model Of Acoustic/Prosody/Concept Relationships For Speech Synthesis

Author

  • Nanette M. Veilleux
Abstract

This paper describes the formalism for incorporating emerging linguistic theory in a joint model of the acoustic/prosody/concept relationships. It uses binary decision trees to estimate the model parameters, which are conditional probabilities. In doing so, the model remains general and can accommodate the results of our evolving understanding of the interaction between the factors that determine prosody. While this model has been successful in both speech synthesis and analysis applications, it has made use of syntactic and pragmatic information alone. Extending this model to map prosodic structure to other higher order linguistic structures that more fully describe the meaning of an utterance is straightforward. As hypotheses are developed about the ranking of competing constraints, including focus structure, and about the role of discourse history, they can be integrated into the model as features in the binary decision tree.

1 Introduction

Prosody, particularly the placement of phrase breaks and relatively prominent syllables within an utterance, is important in human understanding of speech (Boogaart and Silverman, 1992; Price et al., 1991). While great improvements in the prosody of synthetic speech have been made over the past decade, naturalness itself has proved elusive (Boogaart and Silverman, 1992; Veilleux, 1994). One reason for the remaining differences between synthetic and human speech is an incomplete understanding of the mapping between the speaker's intended meaning and that meaning's acoustic consequences, which are, in part, encoded in the prosodic structure (Price et al., 1991). Prosody, therefore, is an important part of the route from meaning to speech. To see how prosody can be improved in automatic speech synthesis systems, it is useful to examine what is known about the relationship between prosody and the acoustic speech signal on the one hand, and between prosody and the meaning embedded in that speech on the other.
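The abstract's central mechanism is a binary decision tree whose leaves supply conditional probabilities, with new linguistic hypotheses entering as extra tree features. The following is a minimal sketch of that idea, not the paper's implementation. It assumes categorical features at each word boundary, yes/no questions of the form "is feature == value?", and greedy entropy-reduction splitting (a standard CART-style criterion; the paper's own splitting and stopping rules are not given here). The feature names `syn` and `accent`, the labels `break`/`none`, the helpers `grow`/`prob`, and the training data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2
from typing import Dict, List, Optional, Tuple

Features = Dict[str, str]        # hypothetical features at one word boundary
Example = Tuple[Features, str]   # (features, prosodic label)


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


@dataclass
class Node:
    dist: Dict[str, float]                      # relative-frequency estimate of P(label | node)
    question: Optional[Tuple[str, str]] = None  # binary question: "is feature == value?"
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def grow(data: List[Example], min_leaf: int = 2) -> Node:
    """Greedily grow a binary tree, splitting on the question that most reduces entropy."""
    labels = [y for _, y in data]
    node = Node({k: c / len(labels) for k, c in Counter(labels).items()})
    best, best_score = None, entropy(labels)
    for f, v in sorted({(f, v) for x, _ in data for f, v in x.items()}):
        yes = [y for x, y in data if x.get(f) == v]
        no = [y for x, y in data if x.get(f) != v]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue  # too little data on one side to estimate a leaf
        score = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if score < best_score - 1e-12:  # strict improvement; ties keep the earlier question
            best, best_score = (f, v), score
    if best is not None:
        f, v = best
        node.question = best
        node.yes = grow([(x, y) for x, y in data if x.get(f) == v], min_leaf)
        node.no = grow([(x, y) for x, y in data if x.get(f) != v], min_leaf)
    return node


def prob(node: Node, x: Features) -> Dict[str, float]:
    """Answer the tree's questions for x and return P(label | leaf)."""
    while node.question is not None:
        f, v = node.question
        node = node.yes if x.get(f) == v else node.no
    return node.dist


# Toy, hand-made data (hypothetical, not from the paper).
data = [
    ({"syn": "clause", "accent": "yes"}, "break"),
    ({"syn": "clause", "accent": "no"}, "break"),
    ({"syn": "clause", "accent": "no"}, "break"),
    ({"syn": "word", "accent": "yes"}, "break"),
    ({"syn": "word", "accent": "yes"}, "none"),
    ({"syn": "word", "accent": "no"}, "none"),
    ({"syn": "word", "accent": "no"}, "none"),
    ({"syn": "word", "accent": "no"}, "none"),
]
tree = grow(data)
print(tree.question)                                   # ('syn', 'clause')
print(prob(tree, {"syn": "word", "accent": "yes"}))    # {'break': 0.5, 'none': 0.5}
```

Adding a key such as a hypothetical `focus` to each feature dictionary makes it a candidate question with no other code change. This mirrors the abstract's claim that new hypotheses can be integrated as tree features. The leaves use unsmoothed relative frequencies and there is no pruning, so a practical system would need both.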
Clearly, prosody is related to the acoustic speech signal. In human speech, prosodic phrases and prominence are cued by acoustic features such as the f0 contour and duration. For example, in the reply of the following exchange, final syllable lengthening and a descending f0 pattern on the word think (as well as know) will lead the listener to perceive a phrase break between think and I:

Don't you think Michael Jordan is great? I don't think; I know.

Furthermore, the reply might reasonably be produced with a pitch accent (e.g., an H*, or rise/fall f0 pattern) on the words think and know, lending the perception that these two words are more prominent than other words in the utterance and that they are being contrasted (Prevost, 1996).

Prosody is also related to higher order linguistic structures such as the syntax of an utterance. In the above example, the main clauses [S I don't think] and [S I know] align with major prosodic phrase breaks. Other researchers, as well as myself, have gainfully used this prosody/syntax relationship in both speech synthesis and speech analysis (automatic recognition and understanding) applications (e.g., Veilleux, 1996; Wang and Hirschberg, 1992). Note also that the same word string could be used to convey the opposite meaning, as in:

What's the capital of Sri Lanka? I don't think I know.

This sentence, however, does not have the same syntactic structure, and one would not expect the same prosodic structure (e.g., there would be less of a "break" after think in the second example). Yet, just as syntax is not fully determined by word choice or order, syntax is not the only factor that determines prosodic structure. For example (from Steedman, 1991), the sentence [S [NP Mary] [VP prefers [NP corduroy]]] can be naturally produced with a major phrase break bisecting the verb phrase: (Mary prefers) (corduroy). It is reasonable to conclude that prosody is constrained by factors in addition to (and possibly in conflict with) syntax.
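The Mary/corduroy argument can be made concrete by computing one simple syntactic feature from the bracket notation used above: the number of constituents that close immediately after each word. This is a minimal sketch under stated assumptions. The feature and the function name `boundary_strength` are illustrative and not necessarily what the paper uses, and the tokenizer assumes uppercase constituent labels such as `S`, `NP`, and `VP`.

```python
import re
from typing import List, Tuple


def boundary_strength(bracketed: str) -> List[Tuple[str, int]]:
    """Pair each word with the number of constituents that close right after it."""
    # Tokens: an opening bracket plus its label ("[S", "[NP"), a closing "]", or a word.
    tokens = re.findall(r"\[[A-Z]+|\]|[^\s\[\]]+", bracketed)
    result: List[Tuple[str, int]] = []
    for tok in tokens:
        if tok == "]":
            if result:  # credit each closing bracket to the most recent word
                word, n = result[-1]
                result[-1] = (word, n + 1)
        elif not tok.startswith("["):
            result.append((tok, 0))
    return result


print(boundary_strength("[S [NP Mary] [VP prefers [NP corduroy]]]"))
# [('Mary', 1), ('prefers', 0), ('corduroy', 3)]
```

On this feature alone, the juncture after Mary outranks the one after prefers. The natural phrasing (Mary prefers) (corduroy), however, places the break after prefers. That mismatch is exactly why a purely syntactic predictor needs additional factors.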
The following example shows that semantic issues also play a role in determining


Similar sources

An Acoustic Study of Emotivity-Prosody Interface in Persian Speech Using the Tilt Model

This paper aims to explore some acoustic properties (i.e. duration and pitch amplitude of speech) associated with three different emotions: anger, sadness and joy against neutrality as a reference point, all being intentionally expressed by six Persian speakers. The primary purpose of this study is to find out if there is any correspondence between the given emotions and prosody patterning in P...


Recent improvements on Microsoft's trainable text-to-speech system-Whistler

Whistler Text-to-Speech engine was designed so that we can automatically construct the model parameters from training data. This paper will focus on recent improvements on prosody and acoustic modeling, which are all derived through the use of probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of...



Fluent speech prosody: Framework and modeling

The prosody of fluent connected speech is much more complicated than concatenating individual sentence intonations into strings. Prosody framework and modeling should be based on a better understanding of both the production and perception of fluent speech. We analyzed speech corpora of read Mandarin Chinese discourses from a top-down perspective on perceived units and boundaries, and consistently iden...


Duration, intensity and pause predictions in relation to prosody organization

Our research group has postulated a perceptually based multiphrase prosody framework for speech paragraphs in fluent speech using corpus analyses. The framework features a prosody hierarchy that organizes phrases and sentences into prosodic groups (PG) in connected speech, and specifies cross-phrase prosodic relationships in the acoustic domains [1, 2]. A corresponding fluent speech prosody m...



Publication date: 1997